12 research outputs found

How Hard is Counting Triangles in the Streaming Model

Author: A. Rinaldo
C. Tsourakakis
H. Jowhari
J. Eckmann
M. Alon
M. Kolountzakis
O. Frank
Publication date: 01/01/2013

The problem of (approximately) counting the number of triangles in a graph is one of the basic problems in graph theory. In this paper we study the problem in the streaming model. We study the amount of memory required by a randomized algorithm to solve this problem. In case the algorithm is allowed one pass over the stream, we present a best possible lower bound of $\Omega(m)$ for graphs $G$ with $m$ edges on $n$ vertices. If a constant number of passes is allowed, we show a lower bound of $\Omega(m/T)$, where $T$ is the number of triangles. We match, in some sense, this lower bound with a 2-pass $O(m/T^{1/3})$-memory algorithm that solves the problem of distinguishing graphs with no triangles from graphs with at least $T$ triangles. We present a new graph parameter $\rho(G)$, the triangle density, and conjecture that the space complexity of the triangles problem is $\Omega(m/\rho(G))$. We match this by a second algorithm that solves the distinguishing problem using $O(m/\rho(G))$ memory.
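To make the phrase "in some sense" concrete, the following worked arithmetic compares the multi-pass lower bound with the 2-pass upper bound quoted in the abstract. The values of $m$ and $T$ are illustrative assumptions, not figures from the paper, and constant factors hidden by the asymptotic notation are ignored.

$$
m = 10^{6},\quad T = 10^{3}:\qquad
\frac{m}{T} = 10^{3},\qquad
\frac{m}{T^{1/3}} = \frac{10^{6}}{10} = 10^{5},\qquad
\frac{m/T^{1/3}}{m/T} = T^{2/3} = 10^{2}.
$$

Under these assumed values the two bounds differ by a factor of $T^{2/3}$, so they coincide only when $T$ is a constant.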

arXiv.org e-Print Archive

Crossref

Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning

Author: A. Hajnal
A. Magen
C. Papadimitriou
D. Knuth
F. Chung
H. Chernoff
H. Jowhari
J. Feigenbaum
J.H. Kim
M. Latapy
N. Alon
O. Frank
S. Wasserman
T. Schank
V.H. Vu
W. Johnson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real-world applications such as spam detection, uncovering of the hidden thematic structure of the Web and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use a small amount of space, provide accurate estimates of the number of triangles and preferably are parallelizable. In this paper we present an efficient triangle counting algorithm which can be adapted to the semistreaming model. The key idea of our algorithm is to combine the sampling algorithm of Tsourakakis et al. with the partitioning of the set of vertices into a high-degree and a low-degree subset, as in the work of Alon, Yuster and Zwick, treating each set appropriately. We obtain a running time of $O\left(m + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2}\right)$ and an $\epsilon$-approximation (multiplicative error), where $n$ is the number of vertices, $m$ the number of edges and $\Delta$ the maximum number of triangles an edge is contained in. Furthermore, we show how this algorithm can be adapted to the semistreaming model with space usage $O\left(m^{1/2}\log{n} + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2}\right)$ and a constant number of passes (three) over the graph stream. We apply our methods to various networks with several millions of edges and we obtain excellent results. Finally, we propose a random projection based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance.

Comment: 12 pages. To appear in the 7th Workshop on Algorithms and Models for the Web Graph (WAW 2010).
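The edge-sampling idea attributed to Tsourakakis et al. in the abstract above can be illustrated with a minimal sketch: keep each edge independently with probability `p`, count triangles exactly in the sample, and rescale by `1/p**3`, since a triangle survives only if all three of its edges are kept. The function names, the edge-list input format and the `seed` parameter are assumptions made for illustration. This sketch does not include the degree-based vertex partitioning described in the abstract and is not the paper's algorithm.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Exact triangle count of a simple undirected graph given as an edge list."""
    adj = {}
    unique_edges = set()
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
        unique_edges.add((min(a, b), max(a, b)))
    # Each triangle is found once per edge it contains, hence the division by 3.
    total = sum(len(adj[u] & adj[v]) for u, v in unique_edges)
    return total // 3


def estimate_triangles(edges, p, seed=None):
    """Unbiased triangle estimate from a random edge sample (hypothetical helper).

    Each edge is kept independently with probability p; a triangle survives
    with probability p**3, so the sampled count is scaled by 1 / p**3.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]
    return count_triangles(kept) / p**3


if __name__ == "__main__":
    k5 = list(combinations(range(5), 2))  # complete graph on 5 vertices
    print(count_triangles(k5))
    print(estimate_triangles(k5, p=0.5, seed=1))
```

Smaller values of `p` reduce the work on the sample but increase the variance of the estimate, which is the trade-off the abstract's $\epsilon$-dependent bounds quantify.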

arXiv.org e-Print Archive

CiteSeerX

Crossref

Graph Sketching

Author: H Jowhari
KJ Ahn
M Kapralov
S Bhattacharya
S Guha
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Approximate Counting of Cycles in Streams

Author: H. Jowhari
J. Flum
L.S. Buriol
N. Karmarkar
S. Chien
S. Ganguly
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Subgraph counting is a fundamental problem in algorithm design and has many applications in data mining, biology, social networks, and many other domains. Over the past years this problem has been studied extensively from a theoretical point of view. Because of the intensive computational resources required, traditional algorithms are infeasible even for medium-sized graphs. A natural way to address this problem in a massive graph is to use the data streaming model, where edges arrive in an arbitrary order and the algorithm is required to use limited memory to approximate the number of subgraphs. Prior to our work, most subgraph counting algorithms were based on edge sampling. In this paper we develop a novel approach for counting cycles of an arbitrary but fixed size in the turnstile model, i.e., the input stream is a sequence of edge insertions and deletions. Our algorithm is based on the idea of computing instances of complex-valued random variables over the given stream, and improves drastically upon the naïve sampling algorithms. In contrast to most existing approaches, our algorithm can also be easily applied in the distributed setting. We believe that the idea of using complex-valued random variables will find further applications, in particular with respect to counting more general subgraphs.
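The turnstile model mentioned above, a stream of edge insertions and deletions, can be illustrated with a small exact baseline. This sketch stores every edge, so it uses linear memory and is not the paper's complex-valued estimator. The class name, the `update(u, v, delta)` interface and the restriction to simple graphs (repeated insertions of a present edge are ignored) are all assumptions made for illustration.

```python
from collections import defaultdict


class TurnstileTriangleCounter:
    """Exact triangle count under edge insertions and deletions (illustrative only)."""

    def __init__(self):
        self.adj = defaultdict(set)  # vertex -> set of current neighbours
        self.triangles = 0

    def update(self, u, v, delta):
        """Apply one stream item: delta=+1 inserts edge {u, v}, delta=-1 deletes it."""
        if u == v or delta not in (1, -1):
            raise ValueError("expected distinct endpoints and delta in {+1, -1}")
        present = v in self.adj[u]
        if delta == 1 and not present:
            # Every common neighbour of u and v closes one new triangle.
            self.triangles += len(self.adj[u] & self.adj[v])
            self.adj[u].add(v)
            self.adj[v].add(u)
        elif delta == -1 and present:
            self.adj[u].discard(v)
            self.adj[v].discard(u)
            # Every common neighbour formed a triangle through the removed edge.
            self.triangles -= len(self.adj[u] & self.adj[v])


if __name__ == "__main__":
    counter = TurnstileTriangleCounter()
    stream = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (2, 3, 1), (0, 3, 1), (0, 2, -1)]
    for u, v, delta in stream:
        counter.update(u, v, delta)
    print(counter.triangles)
```

A sublinear-space algorithm has to answer the same queries without storing the adjacency sets, which is why deletions make sampling-based approaches harder to apply.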

CiteSeerX

Crossref

CISPA – Helmholtz-Zentrum für Informationssicherheit

MPG.PuRe

Explore Bristol Research

Counting arbitrary subgraphs in data streams

Author: H. Jowhari
L.S. Buriol
M. Gonen
M. Manjunath
M.E.J. Newman
N. Alon
R. Milo
R. Pagh
S. Ganguly
Publication date: 01/01/2012

We study the subgraph counting problem in data streams. We provide the first non-trivial estimator for approximately counting the number of occurrences of an arbitrary subgraph H of constant size in a (large) graph G. Our estimator works in the turnstile model, i.e., it can handle both edge insertions and edge deletions, and is applicable in a distributed setting. Prior to this work, estimators were known only for a few non-regular graphs in the case of edge insertions, leaving the problem of counting general subgraphs in the turnstile model wide open. We further demonstrate the applicability of our estimator by analyzing its concentration for several graphs H and for the case where G is a power law graph.

CiteSeerX

Crossref

CISPA – Helmholtz-Zentrum für Informationssicherheit

MPG.PuRe

Explore Bristol Research

Approximately counting triangles in large graph streams including edge duplicates with a fixed memory usage

Author: Bar-Yossef Z.
Berry J. W.
Flajolet P.
Jowhari H.
Schank T.
Seshadhri C.
Welser H. T.
Publication venue: 'VLDB Endowment'

Crossref

Annotations in Data Streams

Author: A. Razborov
A. Shamir
C. Demetrescu
C. Lund
F. Ablayev
G. Cormode
H. Jowhari
J. Feigenbaum
M. Charikar
N. Alon
R. Freivalds
T. Kimbrel
W. Johnson
Publication date: 01/01/2009

The central goal of data stream algorithms is to process massive streams of data using sublinear storage space. Motivated by work in the database community on outsourcing database and data stream processing, we ask whether the space usage of such algorithms can be further reduced by enlisting a more powerful "helper" who can annotate the stream as it is read. We do not wish to blindly trust the helper, so we require that the algorithm be convinced of having computed a correct answer. We show upper bounds that achieve a non-trivial tradeoff between the amount of annotation used and the space required to verify it. We also prove lower bounds on such tradeoffs, often nearly matching the upper bounds, via notions related to Merlin-Arthur communication complexity. Our results cover the classic data stream problems of selection, frequency moments, and fundamental graph problems such as triangle-freeness and connectivity. Our work is also part of a growing trend (including recent studies of multi-pass streaming, read/write streams and randomly ordered streams) of asking more complexity-theoretic questions about data stream processing. It is a recognition that, in addition to practical relevance, the data stream model raises many interesting theoretical questions in its own right.

CiteSeerX

Crossref